Genetic characterization and phylogenetic analysis of the Nigella sativa (black seed) plastome

In this study, the complete plastome sequence of Nigella sativa (black seed) was analyzed for the first time. The plastome is approximately 154,120 bp long and comprises four sections: the large single-copy (LSC) region (85,538 bp), the small single-copy (SSC) region (17,984 bp), and two inverted repeat (IR) regions (25,299 bp each). A comparison of the N. sativa plastome with those of ten other species from various genera of the Ranunculaceae revealed substantial structural variation. Contraction of the IR regions in N. sativa shifts the boundaries of the single-copy regions, resulting in a shorter plastome than in the other species. The N. sativa plastome is markedly divergent from those of its related species, with the exception of N. damascena. Among these, the plastome of A. glaucifolium shows the highest average pairwise sequence divergence from N. sativa (0.2851), followed by A. raddeana (0.2290) and A. coerulea (0.1222). The study also identified 12 hotspot regions with elevated Pi values (> 0.1): trnH-GUG-psbA, matK-trnQ-UUG, psbK-trnR-UCU, atpF-atpI, rpoB-psbD, ycf3-ndhJ, ndhC-cemA, petA-psaJ, trnN-GUU-ndhF, trnV-GAC-rps12, ycf2-trnI-CAU, and ndhA-ycf1. Approximately 24 tandem repeats and 48 palindromic and forward repeats were detected in the N. sativa plastome, together with 32 microsatellites, most of them mononucleotide repeats. In the N. sativa plastome, phenylalanine was encoded by the most codons (1982), while alanine was the least common amino acid (260 codons). A phylogenetic tree constructed from protein-coding genes revealed a distinct monophyletic clade comprising N. sativa and N. damascena, closely allied to the tribe Cimicifugeae with robust support. This plastome provides valuable genetic information for precise species identification, phylogenetic resolution, and evolutionary studies of N. sativa.


General features and composition of the plastome
This research investigates the plastome structure of N. sativa and compares it with the plastomes of ten additional species within the Ranunculaceae family. The complete plastome of N. sativa has a quadripartite structure, consistent with the typical organization of most land plant plastomes (Fig. 1). It is approximately 154,120 bp in size and is divided into four main sections: the LSC region (85,538 bp), the SSC region (17,984 bp), and two IR regions of 25,299 bp each. Among the 11 selected plastomes, that of P. anemonoides was the largest (164,383 bp), whereas that of N. sativa was the shortest. The plastome of N. sativa contains a total of 128 genes: 83 protein-coding genes, 37 transfer RNA (tRNA) genes, and eight ribosomal RNA (rRNA) genes (Table 1). This is the smallest gene count among all the plastomes examined; A. coerulea, by comparison, has a total of 140 genes. The number of protein-coding genes (PCGs) varies across the studied species from 81 to 94, with N. sativa possessing 83. Among all species in the study, A. glaucifolium has the highest number of PCGs, while A. coerulea has the lowest. Within the plastome of N. sativa, 11 genes (rps11, rps12, rps14, rps15, rps18, rps19, rps2, rps3, rps4, rps7 and rps8) encode small ribosomal subunit proteins, while eight genes (rpl14, rpl16, rpl2, rpl20, rpl22, rpl23, rpl33 and rpl36) encode large ribosomal subunit proteins. A further 45 genes encode photosynthesis-related proteins, and four genes (rpoA, rpoB, rpoC1, and rpoC2) encode subunits of the DNA-dependent RNA polymerase. Lastly, nine genes (accD, ccsA, cemA, matK, clpP, infA, ycf1, ycf2, and ycf4) encode other proteins, as outlined in Table 2.

The tRNA gene count ranges from 36 (in A. glaucifolium and A. raddeana) to 45 (in A. coerulea), while the rRNA gene count is constant at eight across all plastomes. We found 11 intron-containing genes in the N. sativa plastome: eight (atpF, ndhA, ndhB, petB, petD, rpl16, rpl2, and rpoC1) contain a single intron, whereas three (clpP, rps12 and ycf3) contain two introns each (Table 3). GC content was similar among the 11 plastomes; N. sativa has a GC content of approximately 38%, while A. coerulea has the highest (39%). The total PCG length of the N. sativa plastome is 76,339 bp, and across species PCG length ranges from 75,870 bp (N. damascena) to 84,105 bp (A. glaucifolium). IR length ranges from 25,162 bp (N. damascena) to 31,279 bp (A. raddeana), suggesting a positive correlation between overall plastome length and IR size across species (Table 1).

We also examined the codon usage frequency of the protein-coding genes in the N. sativa plastome: phenylalanine was encoded by the most codons (1982), followed by lysine (1912), while alanine was the least common amino acid (260 codons). Of all codons analyzed, 35 had a relative synonymous codon usage (RSCU) value greater than 1 in the N. sativa plastome. The most favored codon was AGA (arginine), with an RSCU value of 1.78, followed by CAU (histidine), with an RSCU value of 1.44 (Table S1).
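The region lengths and gene counts above can be cross-checked by simple addition. The sketch below assumes, as is standard for a quadripartite plastome, that 25,299 bp is the length of each IR copy, so the IR is counted twice; the function names are illustrative and not part of any published pipeline.

```python
"""Consistency check of the reported N. sativa plastome figures (illustrative)."""

# Values reported in the text (Table 1).
LSC_BP = 85_538
SSC_BP = 17_984
IR_BP = 25_299  # assumption: length of ONE inverted-repeat copy
REPORTED_TOTAL_BP = 154_120


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: LSC + SSC + two identical IR copies (IRa and IRb)."""
    return lsc + ssc + 2 * ir


def total_genes(protein_coding: int, trna: int, rrna: int) -> int:
    """Total annotated genes as the sum of the three gene classes."""
    return protein_coding + trna + rrna


if __name__ == "__main__":
    print(quadripartite_length(LSC_BP, SSC_BP, IR_BP))  # 154120
    print(total_genes(83, 37, 8))                        # 128
    # Intron-containing genes: eight with one intron plus three with two.
    print(8 + 3)                                         # 11
```

Both sums reproduce the reported totals (154,120 bp and 128 genes), which supports reading 25,299 bp as the size of each IR rather than of the two combined.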

Comparative analysis and divergence
The mVISTA analysis revealed sequence variability among the 11 plastomes. Coding regions showed comparatively low sequence divergence, whereas noncoding regions were more divergent. N. damascena was notably more similar to N. sativa than any other species was. Even so, a distinctive pattern of divergence was observed in the region spanning trnL to ycf1, particularly in the SSC region (Fig. 2). The extent of divergence varied both among species and among genomic regions, as described below.
The most substantial divergences were found within the LSC region, in A. raddeana and A. glaucifolium. Notable divergences were also observed in other species, especially in the psbA to atpH, rpoB to trnT, and ycf3 to ndhJ regions. A. coerulea also showed a striking pattern of divergence, particularly within the rbcL to clpP region of the LSC. In the SSC region, all plastomes diverged markedly from N. sativa (Fig. 2), especially from ndhF to ycf1, with A. glaucifolium being the most divergent. By contrast, the IR region showed lower divergence than the LSC and SSC regions. The ycf2 gene, however, demonstrated …

The average pairwise sequence divergence was also calculated for the complete plastome and for protein-coding genes. The A. glaucifolium plastome showed the highest average pairwise sequence divergence from N. sativa (0.2851), followed by A. raddeana (0.2290) and A. coerulea (0.1222), whereas N. damascena showed a low divergence of 0.0117 (Table S1 and Fig. 3). Divergence in protein-coding genes, depicted as a heatmap, follows a distinct pattern: ycf1 is the most divergent gene relative to N. sativa, with the highest pairwise sequence divergence (0.2283), and other divergent genes include rpl14, rpl16, rpl20, ccsA, cemA, matK, psbT, ndhA, and ndhF in all species except N. damascena, which resembles N. sativa. Together, these results illustrate the evolutionary dynamics and genetic divergence among these species.
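The text does not specify the software or distance model behind the average pairwise sequence divergence values. As a hedged illustration only, the sketch below computes the uncorrected p-distance (proportion of differing sites) between two pre-aligned sequences and averages it over gene pairs, skipping columns with gaps or ambiguous bases (pairwise deletion). The function names are hypothetical, and the published values may rest on a different model.

```python
from __future__ import annotations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned DNA sequences.

    Columns where either sequence has a gap or a non-ACGT character are
    skipped (pairwise deletion); the result is mismatches / compared sites.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            mismatches += a != b  # bool adds as 0 or 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def average_divergence(aligned_pairs: list[tuple[str, str]]) -> float:
    """Mean p-distance over (reference, target) aligned gene pairs."""
    if not aligned_pairs:
        raise ValueError("need at least one aligned pair")
    return sum(p_distance(a, b) for a, b in aligned_pairs) / len(aligned_pairs)


if __name__ == "__main__":
    print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
    print(p_distance("ACG-T", "ACGAA"))            # 0.25
```

Averaging per gene gives every locus equal weight; a length-weighted mean over the concatenated alignment is an equally reasonable choice and would weight long genes such as ycf1 more heavily.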

IR expansion and contraction
To explore potential expansion and contraction of the IRs, we compared the IR/SC border regions in the plastomes of 11 taxa of the family Ranunculaceae. The rps19 gene, present in all species except A. raddeana, A. glaucifolium, P. anemonoides, and A. coerulea, crosses the boundary between the LSC and IRb regions. The rpl22 gene was located in the LSC region in all species except A. raddeana, A. glaucifolium, and A. coerulea, in which it was absent (Fig. 5). In A. coerulea, the rpl2 gene, typically located in the IRb region, is shifted to the LSC region. In A. glaucifolium the ycf1 gene fully overlaps the JSB boundary, while in all species it spans the JSA boundary, predominantly … (Fig. 5). This analysis highlights distinct plastome patterns among species; structural variation in the IR and SSC regions can lead to gene rearrangements 40,41 . In this study, the IR regions were longer in P. anemonoides (30,979 bp), A. glaucifolium (31,256 bp), and A. raddeana (31,279 bp) than in N. sativa (25,299 bp) and N. damascena (25,162 bp), which may contribute to the larger plastome sizes of these species. Contraction and expansion of the IR and SSC regions were identified in all studied species. In addition, A. coerulea, which has an enlarged plastome, shows an expansion of the LSC region (Fig. 5).
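The junctions compared in Fig. 5 follow directly from the region lengths. The sketch below assumes the conventional LSC-IRb-SSC-IRa order with numbering starting at the first LSC base, so JLB, JSB and JSA fall at the region ends and JLA (the IRa/LSC border) closes the circle. The helper names, and the gene coordinates in the example, are hypothetical, chosen only to show how a gene such as rps19 can straddle a border.

```python
from __future__ import annotations


def region_coordinates(lsc: int, ir: int, ssc: int) -> dict[str, tuple[int, int]]:
    """1-based, inclusive coordinates of the regions in LSC-IRb-SSC-IRa order."""
    irb_start = lsc + 1
    ssc_start = irb_start + ir
    ira_start = ssc_start + ssc
    return {
        "LSC": (1, lsc),
        "IRb": (irb_start, ssc_start - 1),
        "SSC": (ssc_start, ira_start - 1),
        "IRa": (ira_start, ira_start + ir - 1),
    }


def regions_overlapped(start: int, end: int,
                       coords: dict[str, tuple[int, int]]) -> list[str]:
    """Names of the regions that a feature spanning [start, end] touches."""
    return [name for name, (lo, hi) in coords.items() if start <= hi and end >= lo]


if __name__ == "__main__":
    coords = region_coordinates(lsc=85_538, ir=25_299, ssc=17_984)
    for name, (lo, hi) in coords.items():
        print(f"{name}: {lo:,}-{hi:,}")
    # A hypothetical gene at 85,400-85,700 would cross the LSC/IRb junction (JLB).
    print(regions_overlapped(85_400, 85_700, coords))  # ['LSC', 'IRb']
```

With the N. sativa lengths, IRa ends at position 154,120, the reported plastome size.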

Repeat and SSR analysis
The number of repeats identified in the selected species ranges from 46 to 50, comprising 16 to 28 palindromic, 17 to 26 forward, and 0 to 15 reverse repeats (Fig. 6). N. sativa has 48 repeats in total, 23 palindromic and 25 forward, with no reverse repeats. In all species, repeats of every type are predominantly 18-30 bp long (Fig. 6). Tandem repeats number from 14 to 49 across species, most of them 11-20 bp long; N. sativa has 24 tandem repeats (Fig. 6C). The SSR analysis of the 11 plastomes revealed variation in microsatellite numbers. N. sativa has 32 SSRs, predominantly mononucleotide repeats, together with some di- and trinucleotide repeats. P. anemonoides has the highest number of SSRs of all species, 65 in total (Fig. 7A).
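The section does not name the repeat-detection tool; forward, palindromic and reverse are the categories used by REPuter. As a simplified sketch under that assumption, the code below finds exact repeats of a fixed length k, with "palindromic" used in REPuter's sense of a reverse-complement match. Unlike real tools it allows no mismatches and does not merge overlapping k-mer hits into maximal repeats, so its counts are not comparable with those reported above; all names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def _kmer_positions(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer to the 0-based start positions where it occurs."""
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def _paired_hits(positions: dict[str, list[int]],
                 transform: Callable[[str], str]) -> list[tuple[int, int]]:
    """Pairs (i, j), i < j, where the k-mer at j equals transform(k-mer at i)."""
    pairs = set()
    for kmer, hits in positions.items():
        for j in positions.get(transform(kmer), []):  # .get: no insertion while iterating
            for i in hits:
                if i < j:
                    pairs.add((i, j))
    return sorted(pairs)


def exact_repeats(seq: str, k: int = 30) -> dict[str, list[tuple[int, int]]]:
    """Exact forward, palindromic (reverse-complement) and reverse k-mer repeats."""
    positions = _kmer_positions(seq.upper(), k)
    transforms = {
        "forward": lambda s: s,             # same strand, same orientation
        "palindromic": reverse_complement,  # opposite strand (REPuter's term)
        "reverse": lambda s: s[::-1],       # same strand, reversed order
    }
    return {name: _paired_hits(positions, fn) for name, fn in transforms.items()}


if __name__ == "__main__":
    print(exact_repeats("AAGCTTAAGC", k=4))
    # {'forward': [(0, 6)], 'palindromic': [(0, 2), (2, 6), (3, 5)], 'reverse': []}
```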
Mononucleotide repeats were the predominant SSR type in all plastomes, followed by dinucleotide and trinucleotide repeats; tetranucleotide, pentanucleotide, and hexanucleotide repeats were absent from all plastomes. A and T repeats make up a larger share of mononucleotide repeats than G and C repeats. Similarly, among dinucleotide repeats, AT motifs are more frequent than GC motifs (Fig. 7B).
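The section does not state which SSR tool or thresholds were used. The sketch below is a regex-based scan for perfect microsatellites under assumed MISA-style minimum copy numbers (10 for mononucleotides, 5 for di-, 4 for tri-, and 3 for tetra- to hexanucleotides); these thresholds are assumptions. It reports only non-redundant motifs (for example, it does not report "AA" as a dinucleotide, because that run is already a mononucleotide repeat) and does not merge compound SSRs.

```python
from __future__ import annotations

import re

# Assumed minimum number of tandem copies per motif length (MISA-style defaults).
MIN_COPIES = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repeat of a shorter unit ('AT' yes, 'AA' no)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, copy number) for each perfect SSR found."""
    seq = seq.upper()
    hits = []
    for unit, min_copies in MIN_COPIES.items():
        # One unit of `unit` bases followed by at least min_copies - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        pos = 0
        while (match := pattern.search(seq, pos)) is not None:
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit))
                pos = match.end()
            else:
                # Redundant motif (e.g. 'AA'); retry one base later so a run of a
                # shorter unit does not hide a genuine repeat of this unit length.
                pos = match.start() + 1
    return sorted(hits)


if __name__ == "__main__":
    demo = "G" + "A" * 12 + "G" + "AT" * 6 + "T" + "GGC" * 4 + "T"
    print(find_ssrs(demo))  # [(1, 'A', 12), (14, 'AT', 6), (27, 'GGC', 4)]
```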

Phylogenetic analysis
This study inferred phylogenetic relationships within Ranunculaceae from 73 shared protein-coding genes. Glaucidioideae, Hydrastidoideae, and Coptidoideae emerged as the earliest-diverging lineages within the family. The plastid phylogenomic analysis revealed a well-supported sister relationship between subfamily Thalictroideae and tribe Adonideae (bootstrap value 95). Tribes Asteropyreae and Caltheae formed a clade, but support for this grouping was low (bootstrap value 44). Within Ranunculoideae, our analysis resolved the sister relationship between tribes Anemoneae and Ranunculeae with robust support (bootstrap value 100) (Fig. 8). Based on the protein-coding gene data set, Nigelleae is positioned between Callianthemum and Cimicifugeae and is most closely related to Cimicifugeae, a relationship supported by a bootstrap value of 100. The phylogenetic trees strongly indicate that N. sativa is most closely related to N. damascena, which also belongs to the genus Nigella and forms a clade with it.
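Bootstrap values like those quoted above are obtained by resampling alignment columns. The source does not name the software, so as an assumption the sketch below shows only the generic nonparametric bootstrap: per-gene alignments are concatenated into a supermatrix, columns are resampled with replacement, and a clade's support is the percentage of replicate trees that contain it. Tree inference itself (ML or BI) is outside the sketch, and all function names are hypothetical.

```python
from __future__ import annotations

import random


def concatenate(gene_alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments (taxon -> aligned sequence) into one supermatrix."""
    taxa = sorted(gene_alignments[0])
    for alignment in gene_alignments:
        if sorted(alignment) != taxa:
            raise ValueError("every gene alignment must contain the same taxa")
    return {taxon: "".join(aln[taxon] for aln in gene_alignments) for taxon in taxa}


def bootstrap_replicate(matrix: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(matrix.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in matrix.items()}


def clade_support(replicate_clades: list[set[frozenset[str]]], clade: set[str]) -> float:
    """Percentage of replicate trees (each given as a set of clades) containing `clade`."""
    target = frozenset(clade)
    return 100 * sum(target in clades for clades in replicate_clades) / len(replicate_clades)


if __name__ == "__main__":
    supermatrix = concatenate([{"N_sativa": "AC", "N_damascena": "AG"},
                               {"N_sativa": "T", "N_damascena": "T"}])
    print(supermatrix)  # {'N_damascena': 'AGT', 'N_sativa': 'ACT'}
    replicate = bootstrap_replicate(supermatrix, random.Random(42))
    print(len(replicate["N_sativa"]))  # 3
```

Each replicate would be passed to the tree-inference program, and the resulting trees summarized with a function such as `clade_support` to give values like the 100 reported for Nigelleae plus Cimicifugeae.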

Discussion
In recent years, the plastome has frequently been employed as a DNA super-barcode for the identification, classification, and phylogenetic study of medicinal plants 42,43 . In this study, we used next-generation sequencing to obtain the first complete plastome of N. sativa. Its quadripartite structure is consistent with the typical organization of most land plant plastomes 22,44 . Plastome size ranged from 154,120 bp in N. sativa to 164,383 bp in P. anemonoides (Table 1). These findings align with previous studies reporting size variation among plastomes of different genera within the Ranunculaceae; for example, the plastome sizes of Aquilegia, Delphinium, and Ranunculus have been estimated at 151 kb, 149 kb, and 157 kb, respectively 45 . Earlier studies of different angiosperm groups have shown that plastomes can be either conserved 46 or highly polymorphic 47,48 . Here, the comparison of 11 plastomes from various Ranunculaceae genera revealed significant variation in plastome structure. This agrees with earlier research that found structural variation in Clematis, contrary to the assumption that plastome structure is conserved 49,50 . We observed significant divergence in plastome sequence and gene order in A. raddeana, A. glaucifolium, and A. coerulea compared with N. sativa. The most notable difference from the N. sativa plastome was a large inversion of 36 kb between ycf3 and atpA (LSC region) in the plastomes of A. raddeana and A. glaucifolium; a further inversion of 19 kb between ycf1 and ndhF (SSC region) was observed in the latter species (Fig. 4). Similarly, an inversion of about 22 kb was detected in the large single-copy (LSC) region of A. coerulea, between atpB and clpP2. We also observed several smaller inversions, gene shifts, and rearrangements in the plastomes of these species, whereas the other species, including N. sativa, lack inversions and transpositions. These findings are consistent with the study by 39 , which reported that Clematis has undergone four rearrangements relative to Coptis; Coptis represents the ancestral condition in Ranunculaceae and has a typical chloroplast genome structure. Similarly, minor changes, specifically an inversion of the petN-psbM region, have been documented in Orchidaceae 51 . In contrast, gymnosperms of the Pinaceae exhibit five different plastome structures 52 . The inversion and transposition events identified in A. raddeana, A. coerulea, and A. glaucifolium are consistent with prior research showing that the occurrence of structural rearrangements in plastomes varies within the family. Previous studies have reported inversions in genera such as Anemone, Adonis, and Clematis 53 . Consistent with our study, 54 found that the plastome gene order of Ranunculeae species matches that of numerous other genera (e.g., Aconitum, Thalictrum), with no gene inversions or translocations observed.

Prior research has documented significant genetic divergence among the plastome sequences of Ranunculaceae species 55 . Our aligned sequences likewise show substantial differentiation, particularly in noncoding regions and in the SSC and LSC regions. Nucleotide diversity (Pi) measures the extent of variation among DNA sequences and provides insight into the genetic diversity within a species 56 . Pi values in the plastomes of N. sativa and its related species were higher in the LSC and SSC regions than in the IR regions, consistent with findings in other angiosperms 57,58 . The plastome of N. sativa shows a high degree of sequence similarity to that of N. damascena, as expected for two members of the same genus, although some regions show comparatively lower identity. In contrast, the other nine plastomes display substantial sequence divergence from N. sativa. We therefore compared the N. sativa plastome with seven other sequenced species, excluding A. raddeana and A. glaucifolium because of their higher divergence. Sliding window analysis identified 12 divergent hotspot regions: trnH-GUG-psbA (0.12), matK-trnQ-UUG (0.13), psbK-trnR-UCU (0.1), atpF-atpI (0.12), rpoB-psbD (0.19), ycf3-ndhJ (0.22), ndhC-cemA (0.31), petA-psaJ (0.24), trnN-GUU-ndhF (0.23), trnV-GAC-rps12 (0.17), ycf2-trnI-CAU (0.092), and ndhA-ycf1 (0.27). These highly divergent regions are useful for developing molecular markers for plant identification and for exploring the phylogenetic relationships of N. sativa and related species. The detection of positively selected sites in regions such as atpF-atpI, rpoB-psbD, ycf3-ndhJ, ndhC-cemA, and petA-psaJ suggests that these regions may have undergone adaptation to environmental stressors 59 .

Identifying and classifying Ranunculaceae species is crucial for understanding their evolutionary relationships and ecological roles 59,60 . Previous research showed that combining markers such as ndhC-trnV-UAC, psbE-petL, rps8-rpl14, petN-psbM, atpF-atpI, trnT-GGU-psbD, rpl32-trnL-UAG, rpl16-rps3, rps16-trnQ-UUG, ndhG-ndhI, accD-psaI, trnG-GCC-trnfM-CAU, trnT-UGU-trnL-UAA, psbZ-trnG-GCC, and trnK-UUU-rps16 achieved a 100% species identification rate, significantly higher than the rates achieved by individual markers [59][60][61][62] . The study also revealed that combined markers can identify seven-fold more variant sites than … regions in seven species of Pulsatilla (Ranunculaceae) were identified previously, including six intergenic spacer regions (rps4-rps16, rps16-matK, ndhC-trnV, psbE-petL, ndhD-ccsA and ccsA-ndhF) and four protein-coding regions (ycf1, ndhF and ndhI) 60 . These findings underscore the value of using multiple markers to account for the different rates of nucleotide variation across loci. Combined markers can be particularly advantageous for closely related species, which individual markers may fail to distinguish. In earlier research, the most effective multi-locus barcode for identifying Pulsatilla species was a combination of cpDNA barcodes such as rbcL, matK and trnH-psbA 60 . Furthermore, the ycf1 gene was found to be the most efficient barcode for identifying Aconitum species 61 .

Our findings also indicate that, at the genus level, angiosperms tend to accumulate variation in the LSC and SSC regions of the plastome. This pattern is consistent with the distribution of variation reported in the plastomes of other genera, such as Cymbidium, Oenothera, and Pyrus 63 . Moreover, the concentration of divergent regions in the LSC and SSC aligns with previous reports on Chaenomeles and Lancea species 64,65 . Previously, five plastome types were identified on the basis of differences in the LSC region. N. damascena (Type I) represents the ancestral condition. A. raddeana and A. glaucifolium exhibit Type II, a distinct gene arrangement involving inversions. Likewise, A. coerulea (Type V) features an inversion between accD and clpP1 that distinguishes it from Type I chloroplast genomes. In Ranunculaceae, the Type I plastome is considered the most primitive; according to 39 , all other types originated from Type I through inversions of different genes.
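The hotspot values above come from a sliding-window estimate of nucleotide diversity; the Fig. 3 caption gives a 200 bp window and a 100 bp step. The source does not name the software (DnaSP is commonly used for this). As an assumption-labelled sketch, the code below computes Pi as the mean proportion of differing sequence pairs per site within each window of a multiple alignment, excluding columns with gaps or ambiguous bases; tools differ in how they treat gaps, so values need not match the published ones. Function names are hypothetical.

```python
from __future__ import annotations

import math
from itertools import combinations


def site_diversity(column: list[str]) -> float:
    """Proportion of sequence pairs that differ at one alignment column."""
    pairs = list(combinations(column, 2))
    return sum(a != b for a, b in pairs) / len(pairs)


def sliding_window_pi(alignment: list[str], window: int = 200,
                      step: int = 100) -> list[tuple[float, float]]:
    """Return (window midpoint, Pi) pairs along a multiple sequence alignment.

    Midpoints are in 0-based alignment coordinates. Columns containing a gap
    or a non-ACGT character are excluded; a window with no usable columns
    yields NaN.
    """
    if len(alignment) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    valid = set("ACGT")
    seqs = [seq.upper() for seq in alignment]
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        values = []
        for i in range(start, end):
            column = [seq[i] for seq in seqs]
            if all(base in valid for base in column):
                values.append(site_diversity(column))
        pi = sum(values) / len(values) if values else math.nan
        results.append(((start + end) / 2, pi))
    return results


if __name__ == "__main__":
    for midpoint, pi in sliding_window_pi(["AAAA", "AAAT", "AATT"], window=2, step=2):
        print(midpoint, round(pi, 3))
```

Windows whose Pi exceeds a chosen cutoff (0.1 in the text) would then be mapped back to the genes or spacers they overlap to name each hotspot.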
Codon usage bias (CUB) refers to the unequal frequency with which the synonymous codons encoding the same amino acid occur in the coding sequences of an organism's genome 48 . CUB differs between genes and between species, and can even vary within a species. It is shaped by a combination of factors, including mutation, selection, and genetic drift, acting over the long-term evolution of genes and species 66 . In the N. sativa plastome, phenylalanine was encoded by the most codons (1982). In addition, 35 codons had an RSCU value greater than 1, and the most favored codon was AGA (arginine), with an RSCU value of 1.78.
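RSCU, as commonly defined, divides the observed count of a codon by the count expected if all synonymous codons for its amino acid were used equally, so values above 1 mark preferred codons. A minimal sketch follows, using the codon assignments of NCBI translation table 11 (the plastid code, whose amino-acid assignments are identical to the standard code). The source does not state which tool produced Table S1, and codons are written in DNA form (AGA, CAT) with any U converted to T; the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Standard codon assignments (shared by NCBI tables 1 and 11), bases ordered T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def codon_counts(cds_list: list[str]) -> Counter:
    """Count in-frame codons across coding sequences (U is read as T)."""
    counts: Counter = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skips codons containing ambiguous bases
                counts[codon] += 1
    return counts


def rscu(cds_list: list[str]) -> dict[str, float]:
    """RSCU = observed count / (amino-acid total / number of synonymous codons)."""
    counts = codon_counts(cds_list)
    synonyms: dict[str, list[str]] = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        synonyms[amino_acid].append(codon)
    values = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input; RSCU undefined
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


if __name__ == "__main__":
    values = rscu(["AGAAGAAGACGT"])  # three AGA and one CGT, both arginine
    print(values["AGA"], values["CGT"])  # 4.5 1.5
```

Because arginine has six codons, three AGA and one CGT give an expected count of 4/6 per codon, hence RSCU values of 4.5 and 1.5; the RSCU values of one amino acid always sum to its number of synonymous codons.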
The plastome of higher plants is known for its high degree of conservation. However, genome length varies between species because of the dynamic extension and contraction of the IR, LSC, and SSC regions [67][68][69][70][71] . During plastome evolution, the IR region expands and contracts, with genes moving into the IR region or into the LSC and SSC regions 72 . We compared the two IRs and the two single-copy regions of 11 species. In N. sativa, a notable contraction was observed in the IRs, with only a slight expansion of the SSC region due to the shifting of the rpl2 and ycf1 genes, resulting in a shorter plastome (Fig. 5). In P. anemonoides, by contrast, the IR region is extended. The larger genome of this species may be partly explained by rps19 crossing the LSC/IR junction, so that 107 bp of the gene lie in the IR and are duplicated. Similarly, A. raddeana and A. glaucifolium have expanded IR regions in which the genes infA, rps8, rpl2, ycf1, and rpl36 extend to the JLB junction, while rps11 and rps4 are situated in the LSC region, contributing to the increased genome size. The enlarged genome of A. coerulea results from expansion of the LSC region, while its SSC and IR regions contract. This agrees with previous research reporting significant structural changes in land plant plastomes, including loss of the IR region or of specific gene families 73 . IR expansion and contraction events are important in evolution because they can alter gene content and plastome size 47,74 . IR expansion has been documented in Araceae 74,75 . In certain cases, the LSC region expands while the SSC region shrinks, to only 7000 bp in Pothos 76 . IR expansion can duplicate genes, and IR contraction can reduce duplicated genes to a single copy 47,74 . Changes in IR size can also prompt rearrangements of genes in the SSC region, as recently observed in Zantedeschia 74 .
Long repeats are important contributors to the variation, expansion, and rearrangement of the complete plastome 77 . N. sativa has approximately 48 long repeats, while across the plastomes studied the number of long repeats ranged from 46 (A. coerulea) to 50 (A. raddeana, A. macrophylla, A. angustius). SSRs and long repeats varied considerably among the 11 plastomes. SSRs occur mainly in noncoding regions, where their sequence variation is higher than in coding regions 78 . SSRs can also be used in conservation genetics of endangered plant species, in molecular identification, and in studies of genetic relationships among related species 79,80 . The number of SSRs varied among the 11 species, from 24 (A. raddeana) to 65 (P. anemonoides). Mononucleotide repeats are the most common, followed by dinucleotide repeats, and the prevalent motifs in all species are A and T. This agrees with previous reports that mononucleotide and dinucleotide repeats were the most and second most abundant SSRs in the plastomes of two Caldesia species 81 , and with earlier research showing that plastome SSRs consist predominantly of polythymine (polyT) or polyadenine (polyA) repeats and less frequently of tandem cytosine (C) and guanine (G) repeats 82 . This consistency supports the observation that plastome SSRs are dominated by 'A' or 'T' mononucleotide repeats 83,84 .
The current classification of Ranunculaceae, proposed by 85 , relies on a comprehensive analysis combining morphological and molecular phylogenetic data: 6957 molecular characters and 65 morphological characters. In this classification, Ranunculaceae is divided into five monophyletic subfamilies (Glaucidioideae, Hydrastidoideae, Coptidoideae, Thalictroideae, and Ranunculoideae), and Ranunculoideae is further subdivided into ten strongly supported monophyletic tribes. Our findings agree with previous research supporting Glaucidium as the first-diverging taxon and sister to all other Ranunculaceae [85][86][87] . Our results are also consistent with those of 85 in showing, with robust support, that Hydrastis is the second-diverging taxon and Coptidoideae the third-diverging clade. In earlier studies, the position of Nigelleae within Ranunculaceae has been inconsistent. A previous analysis of plastomes from 38 Ranunculaceae species found that Nigelleae is closely related to Delphinieae, with strong bootstrap support (100) for the clustering of the two tribes in the same clade 88 . Furthermore, an analysis based on 77 protein-coding genes and four rRNA genes found Caltheae to be the sister group of Asteropyreae, and Asteropyreae in turn to be the sister group of the combined clade of Caltheae, Delphinieae, and Nigelleae 39 . Nevertheless, our findings align with those of 89,90 , who identified Nigelleae as the sister group of Cimicifugeae; similar results for Nigelleae were reported previously 91 . In line with our study, they also identified a sister relationship between subfamily Thalictroideae and tribe Adonideae. Moreover, the most strongly supported grouping among the tribes of Ranunculoideae in our analysis (bootstrap value 100) is the sister relationship between Anemoneae and Ranunculeae, consistent with the results of previous studies 85,[92][93][94] .
The data from this study offer valuable insights for future genetic and evolutionary investigations of N. sativa and the broader Ranunculaceae family.

Conclusions
In conclusion, the complete plastome of N. sativa was sequenced and analyzed for the first time and compared with those of related species. The comparison highlighted the conserved overall structure of the N. sativa plastome. However, notable variations in gene order and certain structural changes were identified, caused primarily by the expansion or contraction of the IR regions into or out of the adjacent single-copy regions. The comparative analysis of N. sativa and the other studied plants also revealed highly variable regions, including trnH-GUG-psbA, matK-trnQ-UUG, psbK-trnR-UCU, atpF-atpI, rpoB-psbD, ycf3-ndhJ, ndhC-cemA, petA-psaJ, trnN-GUU-ndhF, trnV-GAC-rps12, and ycf2-trnI-CAU. These fast-evolving loci show promise as molecular markers in future studies. The numbers and types of SSRs and long repeats identified provide further options for developing molecular markers. The phylogenetic analysis showed that N. sativa forms a clade with N. damascena with high bootstrap support (100), and that the tribe Nigelleae is a successive sister to the tribe Cimicifugeae with strong support. This analysis of complete plastomes contributes to conserving medicinal resources, understanding genetic diversity, exploring genome evolution and adaptation history, and investigating the phylogenetic relationships of N. sativa.

Materials and methods

DNA extraction and sequencing
To extract high-quality DNA, young leaves of N. sativa were ground to a fine powder in liquid nitrogen so that DNA would be released efficiently from the cells. DNA was isolated with the DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA), following the kit's protocol. The chloroplast DNA was then sequenced on an Illumina HiSeq-2000 platform at Macrogen (Seoul, Korea), generating 578,630,881 raw reads for N. sativa. To ensure reliable and accurate downstream analysis, low-quality reads were filtered out using a Phred score threshold of 30, so that only high-quality sequences were retained. The plastome was assembled with two methods: the GetOrganelle v1.7.5 pipeline 95 , which is designed specifically for organelle genome assembly, and SPAdes version 3.10.1 (http://bioinf.spbau.ru/spades), used to improve the accuracy and reliability of the assembly.
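The filtering step is described only as a Phred score threshold of 30. A minimal sketch of one common interpretation follows: it parses four-line FASTQ records, decodes Phred+33 qualities (the Illumina 1.8+ encoding, assumed here) and keeps reads whose mean quality is at least 30. Whether the original filtering was applied per read or per base is not stated, the function names are hypothetical, and in practice a dedicated trimming tool would normally be used.

```python
from __future__ import annotations

from typing import Iterator


def parse_fastq(text: str) -> Iterator[tuple[str, str, str]]:
    """Yield (read name, sequence, quality string) from four-line FASTQ records."""
    lines = text.strip().splitlines()
    if len(lines) % 4:
        raise ValueError("FASTQ records must have exactly four lines each")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        yield header[1:], seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding by default)."""
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def filter_reads(text: str, min_q: float = 30.0) -> list[tuple[str, str]]:
    """Keep (name, sequence) of reads whose mean quality is at least min_q."""
    return [(name, seq) for name, seq, qual in parse_fastq(text)
            if qual and mean_phred(qual) >= min_q]


if __name__ == "__main__":
    fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n"
    print(filter_reads(fastq))  # [('r1', 'ACGT')]
```

In Phred+33, 'I' encodes Q40 and '#' encodes Q2, so the first read is kept and the second discarded.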

Genome annotation
The plastome was annotated in several steps using established tools. Initial annotation was carried out with CpGAVAS2 96 and GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html), and tRNA genes were identified with tRNAscan-SE 97 . To verify the annotations, the plastome was compared with reference genomes using Geneious Pro v.10.2.3 98 and tRNAscan-SE (v.1.21) 97 , which allowed start and stop codons and intron boundaries to be identified and manual corrections to be made where necessary. The structural features of the plastome were visualized with Chloroplot 99 . Genomic divergence was assessed using mVISTA in Shuffle-LAGAN mode, with the N. sativa plastome as the reference 55 . The average pairwise sequence divergence between N. sativa and ten related species (N. damascena, A. asiatica, A. angustius, A. raddeana, A. coerulea, A. glaucifolium, P. anemonoides, L. fumarioides, D. fargesii and A. macrophylla) was determined. We extensively compared gene order and performed multiple sequence

Figure 1. Plastome map of N. sativa. Genes drawn outside the circle are transcribed anticlockwise, while those inside the circle are transcribed clockwise. The large single-copy (LSC) region, the inverted repeat (IRa, IRb) regions and the small single-copy (SSC) region are indicated. In the inner circle, darker green corresponds to GC content and lighter green to AT content. Gene colors indicate functional categories.

Figure 2. Alignment visualization of the N. sativa plastome with those of related species. VISTA-based identity plot showing sequence identity among the 10 species, with N. sativa as the reference. The vertical scale indicates percent identity, from 50 to 100%. The horizontal axis indicates coordinates within the plastome. Arrows indicate the annotated genes and their transcription direction. Thick black lines mark the inverted repeats (IRs).

Figure 3. (A) Pairwise sequence distance of 73 protein-coding genes of N. sativa and related species. (B, C) Nucleotide diversity (Pi) of whole plastomes from sliding window analysis (window length 200 bp, step size 100 bp; x-axis: midpoint position of each window; y-axis: Pi of each window) for (B) N. sativa and N. damascena and (C) N. sativa and seven other species.

Figure 4. Synteny plot of N. sativa and ten other plastomes from the family Ranunculaceae. Normal links are shown in chocolate, inverted links in lime green, and gene features in sky blue.

Figure 5. Comparison of the junctions between the large single-copy (LSC), small single-copy (SSC) and inverted repeat (IR) regions in the plastomes of N. sativa and ten other species. Boxes above or below the main line indicate the adjacent border genes. Numbers above the gene features indicate the distance between gene ends and border sites.

Figure 6. Analysis of repeated sequences in N. sativa and ten other Ranunculaceae plastomes. (A) Total numbers of the three repeat types; (B) number of palindromic repeats by length; (C) number of tandem repeats by length; (D) number of forward repeats by length; (E) number of reverse repeats by length.

Figure 7. (A) Number of different SSR types in the plastomes of N. sativa and other species; (B) number of SSR motifs.

Figure 8. Phylogenetic trees of 75 members of the family Ranunculaceae, representing 11 genera, inferred with different methods. The tree shown is based on the data set of 73 commonly shared genes, reconstructed with maximum likelihood (ML) and Bayesian inference (BI). Numbers above nodes are bootstrap values. The red diamond marks the position of N. sativa.
https://doi.org/10.1038/s41598-024-65073-6

Fresh leaves were collected from N. sativa cultivated at the Agriculture Research Center, KPK, Pakistan, and transported in liquid nitrogen to a −80 °C facility. A voucher specimen was deposited in the herbarium of the Agriculture Research Center, KP, Pakistan, under voucher number AGN-NG1 (N. sativa). The plants were identified by Dr. Muhammad Waqas, one of the leading agronomists at the Agriculture Research Center, KPK, Pakistan. The plant samples were collected and processed in accordance with national guidelines and legislation, under permit NJ334/15/78 obtained from the Environmental Protection Agency, Khyber Pakhtunkhwa, Pakistan.

Table 1. Basic features of the plastomes of N. sativa and related species.

Table 2. List of genes annotated in the plastome of N. sativa.

Table 3. Genes with introns in the plastome of N. sativa, with exon and intron lengths.